Introduction
Our client is a Fortune 50 conglomerate with divisions in aerospace, power, renewable energy, digital industry, additive manufacturing, venture capital and finance. They wanted to introduce a subscription model that would let users subscribe to data sources, relieving the load on the client’s finance data lake while continuing to accommodate ad hoc requests.
The Challenge
The high costs and potential liability associated with the prevalence of inaccurate information
The client was dissatisfied with their ability to serve internal customers’ data needs and to track the costs associated with data requests. Their system also made it difficult to provide accurate information for the chargeback process, which takes place after a company in the conglomerate accesses data and in which the associated costs are billed to the central corporate office. The client also struggled to improve data access, transparency and accuracy in intra-conglomerate financial affairs by means of a unified audit log.
The Objective
Implement a cloud-native solution for accurate and transparent data access
The client wanted to achieve the following:
- Implement a subscription model that would enable users to subscribe to various data sources and retrieve data from them according to a fixed schedule to relieve the stress on their finance data lake while continuing to accommodate ad hoc requests
- Charge users for the costs associated with data requests by providing accurate data on transactions
- Base the framework on cloud-native technology for flexibility and scalability, which meant abandoning the legacy ETL (Extract, Transform and Load) tool for the new solution
The Solution
Streamlined, more transparent and accurate accounting
HCLTech proposed a list of technologies and approaches to create a data-as-a-subscription framework.
- In line with the client’s requirements, the eventual solution was built on existing use cases but was generic enough to easily accommodate new internal customers and demands
- The technology included Apache Spark as the analytics engine, AWS Glue as the ETL service providing custom-tailored jobs and Amazon RDS for PostgreSQL as the database engine
- Amazon S3 made it possible to measure the amount of storage used for a request, while Glue made it easier to tag every job run with the name of the requesting subscriber and to track the pricing of individual jobs, since each job that runs in Glue represents a separate instance
- Because the solution’s key AWS-native technologies clearly indicated the costs associated with a request, this combination made it easy to calculate exact chargeback values
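To illustrate how tagged job runs and measured storage can roll up into chargeback values, here is a minimal Python sketch. It is not the client’s implementation: the `JobRun` record, its field names and both rate constants are hypothetical placeholders, and real usage figures would come from the AWS billing and job-run data rather than being typed in by hand.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder rates for illustration only (assumptions, not quoted prices).
GLUE_RATE_PER_DPU_HOUR = 0.44
S3_RATE_PER_GB_MONTH = 0.023


@dataclass(frozen=True)
class JobRun:
    """One ETL job run, tagged with the subscriber who requested it."""
    subscriber: str             # hypothetical tag value identifying the subscriber
    dpu_hours: float            # compute consumed by this run
    output_gb: float            # storage written for the request
    months_stored: float = 1.0  # how long the output is retained


def run_cost(run: JobRun) -> float:
    """Cost of a single run: compute plus storage."""
    compute = run.dpu_hours * GLUE_RATE_PER_DPU_HOUR
    storage = run.output_gb * run.months_stored * S3_RATE_PER_GB_MONTH
    return compute + storage


def chargeback(runs):
    """Sum run costs per subscriber and round to cents."""
    totals = defaultdict(float)
    for run in runs:
        totals[run.subscriber] += run_cost(run)
    return {name: round(total, 2) for name, total in totals.items()}


if __name__ == "__main__":
    runs = [
        JobRun("aerospace", dpu_hours=2.0, output_gb=10.0),
        JobRun("power", dpu_hours=1.0, output_gb=0.0),
        JobRun("aerospace", dpu_hours=0.5, output_gb=0.0),
    ]
    print(chargeback(runs))
```

Because every run carries its subscriber tag, scheduled and ad hoc requests can be costed with the same function.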
The Impact
Improved conglomerate-wide access to finance data lake
The flexible framework built by our team enabled the client to serve subscribers with the data they need from their finance data lake (FDL) in a fully auditable, metadata-driven manner and to calculate the exact costs associated with each request, whether scheduled or ad hoc.
- This greatly streamlined the related accounting processes and made them more transparent, which helped the client avoid disputes about chargeback amounts
- To promote cost savings in the shared environment, an update to the framework made it possible to move part of the ingestion and the architecture to an Amazon EMR cluster whenever the number of subscribers reached critical mass
- Another development was a metadata-driven “pub-sub” subscription model, which automatically approved requests into the metadata layer and started the feed to improve user experiences and make maintenance easier
- Additionally, a mirror stream complementing the current data-model-based system enabled true live streaming of data, fulfilled requests quickly and granted access to data that was not a part of the data model
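The metadata-driven subscription idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the framework itself: the `Subscription` and `SubscriptionRegistry` classes, the status values and the catalog-membership approval rule are all hypothetical, and an in-memory list stands in for the metadata layer and the audit log.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    subscriber: str
    source: str
    schedule: str            # e.g. "daily", or "ad_hoc" for one-off requests
    status: str = "pending"


@dataclass
class SubscriptionRegistry:
    """In-memory stand-in for the metadata layer (hypothetical)."""
    catalog: set                                    # sources open to subscription
    subscriptions: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def subscribe(self, subscriber, source, schedule="daily"):
        """Record a request; approve it automatically if the source is known."""
        sub = Subscription(subscriber, source, schedule)
        sub.status = "active" if source in self.catalog else "rejected"
        self.subscriptions.append(sub)
        # Every request is logged, approved or not, to keep the trail auditable.
        self.audit_log.append((subscriber, source, schedule, sub.status))
        return sub

    def feeds_due(self, schedule):
        """Active subscriptions whose feed should run on the given schedule."""
        return [
            s for s in self.subscriptions
            if s.status == "active" and s.schedule == schedule
        ]


if __name__ == "__main__":
    registry = SubscriptionRegistry(catalog={"general_ledger"})
    registry.subscribe("aerospace", "general_ledger")
    registry.subscribe("power", "unknown_source")
    print(registry.feeds_due("daily"))
    print(registry.audit_log)
```

Keeping approval, scheduling and logging in one metadata record means a new subscriber is onboarded by adding data rather than by writing a new job.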